
*** Project: Analysis of Female Labor Force Participation in China
*** Author: [Steven Zhongwu Li, Fengzhi Lu]
*** Date: [23 Dec, 2025]
*** Description: This script performs an empirical analysis of female labor force participation, focusing on the impact of education advantage and related mechanisms, heterogeneity, and robustness checks.
*******************************************************************************
* --- 0. Global Settings and Setup ---
clear all                 // Clear all memory, including data, mata, and matrices
set more off              // Do not pause for --more-- messages
set maxvar 10000          // Increase maximum number of variables
set matsize 10000         // Increase maximum matrix size
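
* Optional sketch (not part of the original analysis): the script relies on
* several user-written commands. The loop below only reports which ones are
* missing; it installs nothing. The command list is an assumption read off
* the calls made later in this file (esttab ships with the estout package).
foreach cmd in cmp ivreg2 ivreg2h ivhettest esttab outreg2 {
    capture which `cmd'
    if _rc {
        display as error "`cmd' not found; install the package that provides it (e.g. from SSC)"
    }
}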

* Set working directory
cd "D:\Desktop\synthetic dataset"
* --- 1. Data Loading and Variable Definitions ---
use dataset_laborforce_new.dta, clear

* Define global macro for main variables used in descriptive statistics
global desc_vars FLFP Education_advantage Education_year Urban_hukou CCP_member Age Personality_trait Health_status Internet_use Family_status Parent_CCP_member Neighbour_trust_community Gender_role_community Instrument

* Define global macro for core control variables (individual- and community-level controls, excluding regional dummies)
global x_controls Education_year Urban_hukou CCP_member Age Personality_trait Health_status Internet_use Family_status Parent_CCP_member Neighbour_trust_community Gender_role_community

* Define global macro for regional dummy variables (assuming they are named provcd1f to provcd29f)
global region_dummies provcd1f-provcd29f

* Combine core controls and regional dummies for full control set
global x_full_controls $x_controls $region_dummies
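
* Optional sketch (not in the original script): stop early with an error if
* any variable named in the control macros is absent from the loaded data.
* This assumes the dummies really are named provcd1f to provcd29f, as noted above.
confirm variable $x_full_controls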

* Table 2. Descriptive Statistics ---
* Generate and export descriptive statistics table to Excel
quietly  {
	
tabstat $desc_vars, stat(N mean sd min p50 max) format(%6.3f) save
matrix s0 = r(StatTotal)
putexcel set "descriptive_statistics_raw.xlsx", replace
putexcel A1 = "Descriptive Statistics"
putexcel A2 = matrix(s0), names   // Write row and column names along with the statistics
putexcel clear

}

* Table 3. Variable Transformations and Regional Overview ---
* 'educationreversal' (derived from 'Education_advantage') is not generated in this script; it must already exist in the dataset
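* Optional sketch (an assumption, not the authors' code): no line in this
* script generates educationreversal, so it is presumably already in the data.
* The guard below stops with a clear message if it is missing; it does not
* guess at the variable's definition.
capture confirm variable educationreversal
if _rc {
    display as error "educationreversal not found; generate it from Education_advantage before this step"
    exit 111
}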
tabulate educationreversal, missing // Tabulate the distribution of 'educationreversal' with missing values

* Summarize newly created regional dummy variables (assuming they exist or are created elsewhere)
* Note: If these dummies are created later, move this section.
summarize east west central

* Table 4. Baseline Regressions with Endogeneity Treatment (CMP) ---
* Education Reversal Variable and Baseline Regressions
* Baseline CMP Regression for Female Labor Force Participation (FLFP)
 cmp setup
 
quietly {
	cmp (FLFP: FLFP = Education_advantage $x_full_controls) (Education_advantage: Education_advantage = Instrument $x_full_controls), ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14f)
est store baseline_cmp

* Export baseline CMP results
esttab baseline_cmp using "table4_baseline_cmp.rtf", replace ///
    title("Table 4: Baseline Regression Results (CMP)") ///
    pr2 b(%9.3f) se ar2 ///
    star(* 0.10 ** 0.05 *** 0.01) ///
    keep(Education_advantage Instrument) // Keep relevant coefficients
}

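* First-Stage Regression and Weak-Instrument Diagnostics
* (one instrument for one endogenous regressor: the model is exactly identified, so no overidentification test applies)
ivreg2 FLFP $x_full_controls (Education_advantage = Instrument), first cluster(countyid14f)
display "First-stage F-statistic (Kleibergen-Paap rk Wald F): " e(widstat) // ivreg2 stores the weak-identification statistic in e(widstat)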

* Reduced Form Regression Result (Probit)
quietly {
	
probit FLFP Instrument $x_full_controls, vce(cluster countyid14f)
est store reduced_form_probit
esttab reduced_form_probit using "table4_reduced_form.rtf", replace ///
    title("Table 4: Reduced Form Probit Regression") ///
    pr2 b(%9.3f) se ar2 ///
    star(* 0.10 ** 0.05 *** 0.01) ///
    keep(Instrument)
}

* Table 5. Mechanism Analysis: Gender Display Theory ---
* Generate interaction terms
quietly {
	
gen Interact1 = Education_advantage * Gender_role_community // Interaction: education advantage * community gender roles
gen Interact2 = Education_advantage * Gender_ideology       // Interaction: education advantage * individual gender ideology

* CMP Regression with Interaction Terms
cmp (FLFP: FLFP = Education_advantage $x_full_controls Interact1 Interact2) (Education_advantage: Education_advantage = Instrument $x_full_controls Interact1 Interact2), ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14f)
est store mechanism_model_1

* CMP Regression including Labor Market Restriction Controls
	cmp (FLFP: FLFP = Education_advantage $x_full_controls Occupation_segregation Gender_wagegap Bargain_power Interact1 Interact2) (Education_advantage: Education_advantage = Instrument $x_full_controls Occupation_segregation Gender_wagegap Bargain_power Interact1 Interact2), ///
    ind($cmp_probit $cmp_cont) nolr tech(dfp) iter(500) vce(cluster countyid14f)
est store mechanism_model_2

* Export Mechanism Analysis results
esttab mechanism_model_1 mechanism_model_2 using "table5_mechanism_analysis.rtf", replace ///
    title("Table 5: Mechanism Analysis - Gender Display Theory") ///
    pr2 b(%9.3f) se ar2 ///
    star(* 0.1 ** 0.05 *** 0.01) ///
    keep(Interact1 Interact2) // Focus on interaction terms
}


* Table 6. Regression of Individual Gender Ideology on Community Gender Roles ---
quietly {
	
reg Gender_ideology Gender_role_community $x_full_controls, vce(cluster countyid14f)
est store gender_ideology_reg
esttab gender_ideology_reg using "table6_gender_ideology_reg.rtf", replace ///
    title("Table 6: Regression of Individual Gender Ideology on Community Gender Roles") ///
    ar2 b(%9.3f) se ///
    star(* 0.1 ** 0.05 *** 0.01) ///
    keep(Gender_role_community)
}

* Heterogeneity Analysis ---
* Table 7. Heterogeneity by Age Group
quietly {
	
summarize Age, detail // Understand age distribution for cutoff
* Regression for Younger Group (Age <= 49)
cmp (FLFP: FLFP = Education_advantage $x_full_controls) (Education_advantage: Education_advantage = Instrument $x_full_controls) if Age <= 49, ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14f)
est store age_group_young

* Regression for Older Group (Age > 49)
cmp (FLFP: FLFP = Education_advantage $x_full_controls) (Education_advantage: Education_advantage = Instrument $x_full_controls) if Age > 49, ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14f)
est store age_group_old
	
* Heterogeneity by Number of Children
summarize n_childf, detail // Understand the distribution of the number of children for the cutoff
* Regression for the group with more than one child (n_childf > 1)
cmp (FLFP: FLFP = Education_advantage $x_full_controls) (Education_advantage: Education_advantage = Instrument $x_full_controls) if n_childf>1, ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14f)
est store more_child

* Regression for the group with at most one child (n_childf <= 1)
cmp (FLFP: FLFP = Education_advantage $x_full_controls) (Education_advantage: Education_advantage = Instrument $x_full_controls) if n_childf<=1, ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14f)
est store less_child

* Export age and number-of-children heterogeneity results
esttab age_group_young age_group_old more_child less_child using "table7_heterogeneity_age_children.rtf", replace ///
    title("Table 7: Heterogeneity Analysis by Age and Number of Children") ///
    pr2 b(%9.3f) se ar2 ///
    star(* 0.1 ** 0.05 *** 0.01) ///
    keep(Education_advantage)
}

* Table 8. Interactions with Women's Own Education, Family Social Status, and Husband's Education
quietly {
	
gen Interact3 = Education_advantage * Education_year // Interaction: education advantage * own education year
gen Interact4 = Education_advantage * Family_status // Interaction: education advantage * family status
gen Interact5 = Education_advantage * Men_education // Interaction: education advantage * husband's education

* CMP Regression with own education interaction
cmp (FLFP: FLFP = Education_advantage $x_full_controls Interact3) (Education_advantage: Education_advantage = Instrument $x_full_controls Interact3), ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14f)
est store h_own_edu

* CMP Regression with family status interaction
cmp (FLFP: FLFP = Education_advantage $x_full_controls Interact4) (Education_advantage: Education_advantage = Instrument $x_full_controls Interact4), ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14f)
est store h_family_status	

* CMP Regression with husband's education interaction
cmp (FLFP: FLFP = Education_advantage $x_full_controls Interact5) (Education_advantage: Education_advantage = Instrument $x_full_controls Interact5), ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14f)
est store h_husband_edu

* Export results for these interactions
esttab h_own_edu h_family_status h_husband_edu using "table8_heterogeneity_interactions.rtf", replace ///
    title("Table 8: Heterogeneity Analysis with Interaction Terms") ///
    pr2 b(%9.3f) se ar2 ///
    star(* 0.1 ** 0.05 *** 0.01) ///
    keep(Interact3 Interact4 Interact5)
}

* Heterogeneity by Regional Female Labor Force Participation Rate
quietly {
	
use "2014dataset_laborforce_new.dta", clear
gen Interact6 = Education_advantage * Regional_FLFP // Interaction: education advantage * regional FLFP
cmp (FLFP: FLFP = Education_advantage $x_full_controls Interact6 Regional_FLFP) (Education_advantage: Education_advantage = Instrument $x_full_controls Interact6 Regional_FLFP), ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14f)
est store h_regional_flfp
esttab h_regional_flfp using "table8_heterogeneity_regional_flfp.rtf", replace ///
    title("Table 8 Continued: Heterogeneity by Regional FLFP Rate") ///
    pr2 b(%9.3f) se ar2 ///
    star(* 0.1 ** 0.05 *** 0.01) ///
    keep(Interact6)
}

* Table 9. Extension: Impact on Male Labor Force Participation ---
quietly {
	
use "dataset_laborforce_new.dta", clear
* CMP Regression for FLFP using Husband_education_advantage_m
cmp (FLFP: FLFP = Husband_education_advantage_m $x_full_controls) (Husband_education_advantage_m: Husband_education_advantage_m = Instrument $x_full_controls), ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14f)
est store flfp_husband_edu_adv
esttab flfp_husband_edu_adv using "table9_flfp_husband_edu_adv.rtf", replace ///
    title("Table 9: Impact of Husband's Education Advantage on Female LFP") ///
    pr2 b(%9.3f) se ar2 ///
    star(* 0.10 ** 0.05 *** 0.01) ///
    keep(Husband_education_advantage_m)

* Load male-specific dataset for Male LFP analysis
use "dataset_laborforce_new_men.dta", clear

* Define male-specific control variables
global x_male_controls Men_education Urban_hukou_m CCP_member_m Age_m Personality_trait_m Health_status_m Internet_use_m Family_status_m Parent_CCP_member_m Neighbour_trust_community_m provcd1m-provcd29m

* CMP Regression for Male Labor Force Participation (MLFP)
cmp (MLFP_m: MLFP_m = Educational_differential_m $x_male_controls) (Educational_differential_m: Educational_differential_m = Instrument $x_male_controls), ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14mm)
est store mlfp_edu_diff
esttab mlfp_edu_diff using "table9_mlfp_edu_diff.rtf", replace ///
    title("Table 9: Impact of Educational Differential on Male LFP") ///
    pr2 b(%9.3f) se ar2 ///
    star(* 0.10 ** 0.05 *** 0.01) ///
    keep(Educational_differential_m)
}

*  Robustness Checks ---
* Appendix 1. Merging Community Data and Alternative IV Methods

* Robustness Check 1: Control for additional community-level factors
quietly {
	
use "2014dataset_laborforce_new_community.dta", clear // Load dataset with community data
* Baseline CMP (re-run for comparison with new dataset)
cmp (FLFP: FLFP = Education_advantage $x_full_controls) (Education_advantage: Education_advantage = Instrument $x_full_controls), ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14f)
est store rc_cmp_baseline

* CMP with additional community controls and individual gender ideology
cmp (FLFP: FLFP = Education_advantage $x_full_controls Migration_rate Income_percapita Gender_ideology) (Education_advantage: Education_advantage = Instrument $x_full_controls Migration_rate Income_percapita Gender_ideology), ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14f)
est store rc_cmp_community_controls

esttab rc_cmp_baseline rc_cmp_community_controls using "appendix1_robustness_community_controls.rtf", replace ///
    title("Appendix 1: Robustness Checks - Community Controls") ///
    pr2 b(%9.3f) se ar2 ///
    star(* 0.10 ** 0.05 *** 0.01) ///
    keep(Education_advantage)
}

* Appendix 2. Robustness Check 2: Lewbel's IV Method
use "dataset_laborforce_new.dta", clear
* Lewbel's IV Method (ivreg2h builds heteroskedasticity-based generated instruments to supplement the external instrument)
* 1. Copy regional dummy variables into short names: provcd1f-provcd29f -> xx1-xx29
*-------------------------------------------------------------*
forvalues i = 1/29 {
    capture confirm variable provcd`i'f
    if !_rc {
        gen xx`i'= provcd`i'f 
    }
}

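* Redefine the region-dummy macro to point to the copies
* (x_full_controls was expanded when it was defined, so it still refers to provcd1f-provcd29f)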
global region_dummies xx1-xx29

* 2. Generate short-named copies of the control variables (x1-x10); values are unchanged, not standardized
gen x1  = Education_year
gen x2  = Urban_hukou
gen x3  = CCP_member
gen x4  = Personality_trait
gen x5  = Health_status
gen x6  = Internet_use
gen x7  = Family_status
gen x8  = Parent_CCP_member
gen x9  = Neighbour_trust_community
gen x10 = Gender_role_community

global x_full_controls1 x1 x2 x3 Age x4 x5 x6 x7 x8 x9 x10 $region_dummies

ivreg2h FLFP ///
    $x_full_controls1 ///
    (Education_advantage = Instrument), ///
    small robust gmm2s ///
    partial($x_full_controls1) ///
    cluster(countyid14f)

outreg2 using iv_results.doc, ///
    replace ///
    ctitle("Appendix 2: Lewbel's IV") ///
    dec(3)
	
* Standard OLS regression for heteroskedasticity tests
quietly {
	
	reg FLFP Education_advantage $x_full_controls1
}

estat hettest // Breusch-Pagan / Cook-Weisberg test for heteroskedasticity after the OLS fit
ivhettest     // Pagan-Hall-type heteroskedasticity test; it uses the most recent estimates, here the OLS fit above rather than an ivreg2 fit


* Appendix 3. Robustness Check 3: Alternative Dependent and Explanatory Variables
* Reconstruct labor force participation rate based on work nature
use "dataset_laborforce_new.dta", clear

quietly {
* CMP Regression using 'Alternative_DV' as the dependent variable
cmp (Alternative_DV: Alternative_DV = Education_advantage $x_full_controls) (Education_advantage: Education_advantage = Instrument $x_full_controls), ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14f)
est store rc_alternative_dv

* Using Education_advantage_level as the core explanatory variable
cmp (FLFP: FLFP = Education_advantage_level $x_full_controls) (Education_advantage_level: Education_advantage_level = Instrument $x_full_controls), ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14f)
est store rc_edu_level_2014

* Using the 2012 measure (Education_advantage_2012) as the core explanatory variable
cmp (FLFP: FLFP = Education_advantage_2012 $x_full_controls) (Education_advantage_2012: Education_advantage_2012 = Instrument $x_full_controls), ///
    ind($cmp_probit $cmp_cont) nolr qui tech(dfp) vce(cluster countyid14f)
est store rc_edu_level_2012

esttab rc_alternative_dv rc_edu_level_2014 rc_edu_level_2012 using "appendix3_robustness_alternative_vars.rtf", replace ///
    title("Appendix 3: Robustness Checks - Alternative Dependent and Explanatory Variables") ///
    pr2 b(%9.3f) se ar2 ///
    star(* 0.10 ** 0.05 *** 0.01) ///
    keep(Education_advantage Education_advantage_level Education_advantage_2012)
}

* Data Comparison: CFPS 2020 vs CFPS 2014 ---
* Appendix 4. Comparison of Survey Waves
quietly {
	
use "2020-2014cfps_laborforce_new.dta", clear // Load the final longitudinal dataset

estpost summarize FLFP_2014 FLFP_2020 Education_advantage_2014 Education_advantage_2020 Gender_ideology_2014 Gender_ideology_2020

esttab using "Appendix4_descriptive_stats.rtf", replace ///
    cell(mean(fmt(3)) sd(fmt(3)) min(fmt(3)) max(fmt(3))) ///
    nomtitles nonumbers noobs ///
    title("Appendix 4") ///
    label onecell
	
}

* End of script
*******************************************************************************